CSE 599i: Online and Adaptive Machine Learning, Winter 2018. Lecture 6: Non-stochastic best arm identification
Abstract
Example 1. Imagine that we are minimizing a non-convex (multivariate) function f using gradient descent. Recall that gradient descent, with a suitable step size, converges only to a local minimum (more precisely, a stationary point). Because a non-convex function may have several local minima, we cannot guarantee that gradient descent will converge to the global minimum. To mitigate this, we use random restarts: we start multiple instances of gradient descent from random locations and output the lowest-valued minimum found. If an instance is clearly not going to converge quickly to a low-valued minimum, it should be stopped early so that its budget can go to other, more promising instances.
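As a minimal sketch of the idea in Example 1, the code below runs several gradient descent instances from random starting points on a one-dimensional non-convex function and repeatedly stops the worse half of them. Everything specific here is an assumption made for illustration and is not taken from the lecture: the test function `f`, its gradient `grad_f`, the function name `restarts_with_early_stopping`, the step size, and the halving rule (chosen in the spirit of successive halving, as one possible way to decide which instances to stop).

```python
import random


def f(x):
    # Hypothetical non-convex objective with two minima:
    # a global one near x = -1.30 and a local one near x = 1.13.
    return x**4 - 3 * x**2 + x


def grad_f(x):
    # Derivative of f.
    return 4 * x**3 - 6 * x + 1


def restarts_with_early_stopping(n_starts=16, steps_per_round=20, lr=0.01, seed=0):
    """Random restarts where the worse half of the instances is stopped each round."""
    rng = random.Random(seed)
    # Each "instance" is just the current iterate of one gradient descent run.
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n_starts)]
    while len(xs) > 1:
        # Give every surviving instance the same additional budget of steps.
        for i in range(len(xs)):
            for _ in range(steps_per_round):
                xs[i] -= lr * grad_f(xs[i])
        # Rank instances by current objective value and keep the better half.
        xs.sort(key=f)
        xs = xs[: max(1, len(xs) // 2)]
    best_x = xs[0]
    return best_x, f(best_x)


if __name__ == "__main__":
    x, value = restarts_with_early_stopping()
    print(f"best x = {x:.4f}, f(x) = {value:.4f}")
```

Note the hazard this rule carries: an instance that starts slowly but sits in the basin of the global minimum can be eliminated before it overtakes faster instances stuck in a worse basin. Deciding how much budget to give each instance before stopping it is the question that best arm identification addresses.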